We study an evolutionary game of chance in which the probabilities for different outcomes (e.g., heads or tails) depend on the amount wagered on those outcomes. The game is perhaps the simplest possible probabilistic game in which perception affects reality. By varying the 'reality map', which relates the amount wagered to the probability of the outcome, it is possible to move continuously from a purely objective game in which probabilities have no dependence on wagers, to a purely subjective game in which probabilities equal the amount wagered. The reality map can reflect self-reinforcing strategies or self-defeating strategies. In self-reinforcing games, rational players can achieve increasing returns and manipulate the outcome probabilities to their advantage; consequently, an early lead in the game, whether acquired by chance or by strategy, typically gives a persistent advantage. We investigate the game both in and out of equilibrium and with and without rational players. We introduce a method of measuring the inefficiency of the game, and show that the inefficiency decreases slowly in the approach to equilibrium (for large t it is a power law $t^{-\gamma}$ with $0 \le \gamma \le 1$, depending on the subjectivity of the game).



I. INTRODUCTION 
A. Motivation 

To capture the idea that objective outcomes depend on subjective perception, Keynes used the metaphor of a beauty contest in which the goal of the judges is not to decide who is most beautiful, but rather to guess which contestant will receive the most votes from the other judges [12]. Economic problems typically have both purely objective components, e.g., how much revenue a company creates, and subjective components, e.g., how much revenue investors collectively think it will create. The two are inextricably linked: subjective perceptions alter investment patterns, which affect objective outcomes, which in turn affect subjective perceptions.

To study this problem, this paper introduces a simple probabilistic game in which the probability of outcomes depends on the amount bet on those outcomes. The game introduces the idea of a 'reality map' that mediates between subjective perception and objective outcome. This goes beyond Keynes, in that subjective perception actually alters the reality (in his example, the faces of the contestants). The form of the reality map can be tuned to move continuously from purely objective to purely subjective games, and different levels of feedback from perception to reality are easily classified and studied.¹

Consider a probabilistic event, such as a coin toss or the outcome of a horse race. Now suppose that the odds of the outcomes of the event depend on the amount wagered on them. In the case of a coin toss, this means that the probability of heads is a function (the 'reality map') of the amount bet on heads. For a purely objective event, such as the toss of a fair coin, the reality map is simple: the probability of heads is 1/2, independent of the amount bet on it. But most events are not fully objective. In the case of a horse race, for instance, a jockey riding a strongly favored horse may make more money if he secretly bets on the second most favored horse and then intentionally loses the race. This is an example of a self-defeating map from perception to reality: if jockeys misbehave, then as a horse becomes more popular, the objective probability that it will win decreases. Alternatively, in an economic setting, if people like growth strategies they will invest in companies whose prices are going up, which in turn drives prices further up. This is an example of a self-reinforcing reality map.

¹ The results in this paper are presented in more detail in reference [7].

B. Review of related work 

There has been considerable past work on situations where subjective factors influence objective outcomes. Some examples include Hommes's studies of cobweb models [9, 10], studies of increasing returns [2], Arthur's El Farol model and its close relative the minority game [3, 6], Blume and Easley's model of the influence of capital markets on natural selection in an economy [4, 5], and Akiyama and Kaneko's example of a game that changes due to the players' behaviors and states [1]. The model we introduce here has the advantage of being very general yet very simple, providing a tunable way to study this phenomenon under varying levels of feedback.

II. GAME DEFINITION 

A. Wealth dynamics 

Let N agents place wagers on L possible outcomes. In the case of betting on a coin, for example, there are two outcomes, heads and tails. Let $s_{il}$ be the fraction of the i-th player's wealth $w_i$ that is wagered on the l-th outcome. The vector $(s_{i1}, \ldots, s_{iL})$ is the i-th player's strategy, and $p_{il} = s_{il} w_i$ is the amount of money bet on the l-th outcome by player i. Let $p_l = \sum_i p_{il}$ be the total wager on the l-th outcome. If the winning outcome is $l = \lambda$, the payoff $\pi_{i\lambda}$ to player i is proportional to the amount that player bets and inversely proportional to the total amount everyone bets, i.e.

$$\pi_{i\lambda} = \frac{p_{i\lambda}}{p_\lambda} = \frac{s_{i\lambda} w_i}{p_\lambda}.$$

This corresponds to what is commonly called pari-mutuel betting. We assume no "house take", i.e. a fair game. Assume $\sum_l s_{il} = 1$, i.e. that each player bets all her money at every iteration of the game, typically betting non-zero amounts on each possible outcome. The total wealth is conserved and is normalized to sum to one,

$$\sum_i w_i = \sum_{i,l} p_{il} = 1.$$

We will call $q_l$ the probability of outcome $l$, where $\sum_l q_l = 1$. The expected payoff is

$$\mathbb{E}[\pi_i] = \sum_l q_l \, \frac{s_{il} w_i}{p_l}.$$

If the vector q is fixed, after playing the game repeatedly for t rounds the wealth updating rule

$$w_i^{(t+1)} = \frac{s_{i\lambda}\, w_i^{(t)}}{p_\lambda}$$

is equivalent to Bayesian inference, where the initial wealth $w_i^{(0)}$ is interpreted as the prior probability that the outcome probabilities are $q_l = s_{il}$, and the final wealth $w_i^{(t)}$ is its posterior probability. In Bayesian inference, models whose predictions match the actual probabilities of outcomes accrue higher a posteriori probability as more and more events occur. Here, players whose strategies more closely match the actual outcome probabilities accrue wealth on average at the expense of players whose strategies are a worse match.
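A quick numerical check of this equivalence may help. The sketch below is our own illustration, not code from the paper: the three strategies, the fixed bias q = (0.7, 0.3) and the function names `parimutuel_update` and `bayes_posterior` are all hypothetical. It plays the pari-mutuel update round by round and compares the resulting wealths with the Bayesian posterior computed directly from the same prior and the same observed outcomes.

```python
import random

def parimutuel_update(wealth, strategies, outcome):
    """One round of pari-mutuel betting with no house take.

    wealth[i]     -- current wealth w_i (sums to 1)
    strategies[i] -- fractions s_il bet on each outcome l (sums to 1)
    outcome       -- index lambda of the winning outcome
    """
    # Total amount wagered on the winning outcome: p_lambda = sum_i s_{i,lambda} w_i
    p_win = sum(w * s[outcome] for w, s in zip(wealth, strategies))
    # The whole pot (total wealth 1) is shared in proportion to the winning bets
    return [w * s[outcome] / p_win for w, s in zip(wealth, strategies)]

def bayes_posterior(prior, strategies, outcomes):
    """Posterior over the hypotheses 'q = s_i' after observing a sequence of outcomes."""
    unnorm = list(prior)
    for lam in outcomes:
        unnorm = [u * s[lam] for u, s in zip(unnorm, strategies)]  # multiply by likelihood
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Hypothetical example: three coin strategies (P(heads), P(tails)), equal initial wealth
strategies = [[0.2, 0.8], [0.5, 0.5], [0.7, 0.3]]
wealth = [1 / 3, 1 / 3, 1 / 3]
q = [0.7, 0.3]  # fixed objective probabilities (purely objective case)

rng = random.Random(0)
outcomes = [0 if rng.random() < q[0] else 1 for _ in range(200)]
for lam in outcomes:
    wealth = parimutuel_update(wealth, strategies, lam)

posterior = bayes_posterior([1 / 3, 1 / 3, 1 / 3], strategies, outcomes)
print(wealth)
print(posterior)  # agrees with the wealth vector up to round-off
```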
B. Strategies

We first study fixed strategies. For convenience we restrict the possible number of outcomes to L = 2, so that we can think of this as a coin toss with possible outcomes heads and tails. Because the players are required to bet all their money on every round, $s_{i1} + s_{i2} = 1$, we can simplify the notation and let $s_i = s_{i1}$ be the fraction of wealth bet on heads; the amount bet on tails is determined automatically. Similarly $q = q_1$ and $p = p_1$. The space of possible strategies corresponds to the unit interval [0, 1]. We will typically simulate N fixed strategies, $s_i = i/(N-1)$, where $i = 0, 1, \ldots, N-1$. Later on we will also add rational players, who know the strategies of all other players and dynamically adapt their own strategies to maximize a utility function.

C. Reality maps 

The game definition up to this point follows the game studied by Cover and Thomas [8]. We generalize their game by allowing for the possibility that the objective probability q for heads is not fixed, but rather depends on the net amount p wagered on heads. The reality map q(p), where $0 \le q(p) \le 1$, fully describes the relation between bets and outcomes. We restrict the problem slightly by requiring that q(1/2) = 1/2. We do this to give the system the chance for the objective outcome, as manifested by the bias of the coin, to remain constant at q = 1/2. We begin by studying the case where q(p) is a monotonic function, which is either nondecreasing or nonincreasing. Letting q'(p) = dq/dp, we distinguish the following possibilities:

• Objective. q(p) = 1/2, i.e. it is a fair coin independent of the amount wagered. (Other values of q = constant are qualitatively similar to q = 1/2.)

• Self-defeating. q'(p) < 0, e.g. q(p) = 1 - p. In this case the coin tends to oppose the collective perception, e.g. if people collectively bet on heads, the coin is biased toward tails.

• Self-reinforcing. q'(p) > 0. The coin tends to reflect the collective perception, e.g. if people collectively bet on heads, the coin becomes more biased toward heads. A special case of this is the purely subjective map q(p) = p, in which the bias simply reflects people's bets.

It is convenient to have a one-parameter family of reality maps that allows us to tune from objective to self-reinforcing. We choose the family

$$q_a(p) = \frac{1}{2} + \frac{1}{\pi} \arctan \frac{\pi a \left(p - \frac{1}{2}\right)}{1 - (2p - 1)^2}. \qquad (1)$$

The parameter a is the slope at p = 1/2, i.e. $q_a'(1/2) = a$. When a = 0, q(p) is constant (purely objective), and when a > 0, q(p) is self-reinforcing. When a = 1, $q_a'(1/2) = 1$, and $q_a(p)$ is close to the identity map.² We study the self-defeating case separately using the map q(p) = 1 - p.
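The map in equation (1) is easy to evaluate directly. The following minimal sketch (the helper names `q_a` and `slope_at_half` are ours, and it assumes a ≥ 0) checks that $q_a(1/2) = 1/2$, estimates the slope at p = 1/2 by a central difference, and measures the gap between $q_1(p)$ and the identity on a grid, which can be compared with the figures quoted in footnote 2.

```python
import math

def q_a(p, a):
    """Reality map of equation (1); assumes a >= 0 and 0 <= p <= 1."""
    if a == 0:
        return 0.5                    # purely objective: a fair coin
    if p <= 0.0:
        return 0.0                    # limit as p -> 0 when a > 0
    if p >= 1.0:
        return 1.0                    # limit as p -> 1 when a > 0
    x = math.pi * a * (p - 0.5) / (1.0 - (2.0 * p - 1.0) ** 2)
    return 0.5 + math.atan(x) / math.pi

def slope_at_half(a, h=1e-5):
    """Central-difference estimate of q_a'(1/2), which should equal a."""
    return (q_a(0.5 + h, a) - q_a(0.5 - h, a)) / (2 * h)

grid = [k / 10000 for k in range(1, 10000)]
max_gap = max(abs(q_a(p, 1.0) - p) for p in grid)
mean_gap = sum(abs(q_a(p, 1.0) - p) for p in grid) / len(grid)

print(q_a(0.5, 2.0))        # 0.5
print(slope_at_half(1.5))   # close to 1.5
print(max_gap, mean_gap)    # compare with the bounds quoted in footnote 2
```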




FIG. 1: $q_a(p)$ for different values of the parameter: (a) a = 1/3; (b) a = 1; (c) a = 3.



² An inconvenient aspect of this family is that $q_a(p)$ does not include the function q(p) = p. However, $q_1(p)$ is very close to q(p) = p (the difference does not exceed 0.012, and its average value is less than half of this). Still, to avoid any side effects, we study the purely subjective case using q(p) = p.
III. DYNAMICS OF THE OBJECTIVE BIAS

In this section we study the dynamics of the objective bias of the coin, which is the tangible reflection of "reality" in the game. This also gives an overview of the behavior for different reality maps q(p). We use N = 29 agents, each playing one of the 29 equally spaced strategies on (0, 1): 1/30, 2/30, ..., 29/30, and begin by giving them all equal wealth. We then play the game repeatedly and plot the bias of the coin as a function of time. We repeat this for several random number seeds to gauge how variable or consistent the behavior is under each reality map, as shown in Figure 2.
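The experiment described above can be reproduced in outline with a short simulation. This is a sketch under our own assumptions (Python's `random` module, a per-round renormalization of wealth to limit round-off, and the hypothetical function name `simulate_bias`), not the authors' code; it records the objective bias q at every round for an arbitrary reality map `q_map`.

```python
import random

def simulate_bias(q_map, n_agents=29, rounds=2000, seed=0):
    """Play the pari-mutuel coin game with fixed strategies; return (biases, final wealth).

    Strategies are the equally spaced values 1/(N+1), ..., N/(N+1) on (0, 1),
    i.e. 1/30, ..., 29/30 for N = 29, all starting with equal wealth.
    """
    rng = random.Random(seed)
    s = [(i + 1) / (n_agents + 1) for i in range(n_agents)]
    w = [1.0 / n_agents] * n_agents
    biases = []
    for _ in range(rounds):
        p = sum(wi * si for wi, si in zip(w, s))   # total wager on heads
        q = q_map(p)                               # objective bias via the reality map
        biases.append(q)
        if rng.random() < q:                       # heads: paid in proportion to s_i / p
            w = [wi * si / p for wi, si in zip(w, s)]
        else:                                      # tails: paid in proportion to (1 - s_i) / (1 - p)
            w = [wi * (1 - si) / (1 - p) for wi, si in zip(w, s)]
        total = sum(w)                             # renormalize to limit round-off drift
        w = [wi / total for wi in w]
    return biases, w

# Several seeds for the self-defeating map q(p) = 1 - p, as in Figure 2(a)
runs = [simulate_bias(lambda p: 1 - p, seed=k)[0] for k in range(5)]
print([round(r[-1], 3) for r in runs])
```

Because every strategy lies strictly inside (0, 1), p stays between 1/30 and 29/30 and the divisions are always safe. Passing `lambda p: p` or a version of equation (1) gives the settings of the other panels; individual trajectories depend on the seed.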

For the purely objective case q(p) = 1/2, the result is trivial. For the self-defeating case, q(p) = 1 - p, the results become more interesting, as shown in (a). Initially the bias of the coin varies considerably, generally within a range of about 0.3 to 0.7, but it eventually settles into a fixed point at q = 1/2. For this case the bias tends to oscillate back and forth as it approaches its equilibrium value. Suppose, for example, that the first coin toss yields heads; after this toss, players who bet more on heads possess a majority of the wealth. At the second toss, because of the self-defeating nature of the map, the coin is biased towards tails. As a result, wealth tends to shift back and forth between heads and tails players before finally accruing to players who play the 'sensible' unbiased strategy.
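The first step of this oscillation can be worked out exactly. Starting from equal wealth on the 29 strategies 1/30, ..., 29/30, one toss of heads multiplies each wealth by $s_i/p$, so the new total bet on heads is $\sum_i s_i^2 / \sum_i s_i = 59/90$, and the self-defeating map then gives the next toss a probability of heads of only 31/90. The sketch below (our own, using exact rational arithmetic from Python's `fractions` module) confirms this.

```python
from fractions import Fraction

N = 29
s = [Fraction(k, N + 1) for k in range(1, N + 1)]   # strategies 1/30, ..., 29/30
w = [Fraction(1, N)] * N                            # equal initial wealth

p0 = sum(wi * si for wi, si in zip(w, s))           # 1/2 by symmetry
w = [wi * si / p0 for wi, si in zip(w, s)]          # wealth after a first toss of heads
p1 = sum(wi * si for wi, si in zip(w, s))           # new total bet on heads
q1 = 1 - p1                                         # self-defeating map q(p) = 1 - p

print(p0, p1, q1)   # 1/2 59/90 31/90
```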

We then move to the weakly self-reinforcing case using equation (1) with a = 1/2, as shown in (b). The behavior is similar to the previous case, except that the fluctuations of q are now larger. At the end of 2000 rounds of the game, the bias is much less converged on q = 1/2. The bias is also strongly autocorrelated in time: if the bias is high at a given time, it tends to remain high at subsequent times. (This was already true for the self-defeating case, but is more pronounced here.) Although this is not obvious from this figure, after a sufficiently long period of time all trajectories eventually converge to q = 1/2.

FIG. 2: The objective bias q of the coin as a function of time, for different random number seeds: (a) q(p) = 1 - p; (b) a = 1/2 in equation (1); (c) q(p) = p; (d) a = 1.5; (e) q(p) = 3p mod 1. (a)-(d) 50 runs up to t = 2000; (e) 100 runs up to t = 10000.

Next we study the purely subjective case, q(p) = p, as shown in (c). In this case the bias fluctuates wildly in the early rounds of the game, but it eventually converges to one of the strategies $s_i$, corresponding to the player who eventually ends up with all the wealth.

As we increase a above 1, as shown in (d), the instability becomes even more pronounced. The bias initially fluctuates near q = 1/2, but it rapidly diverges to fixed points at either q = 0 or q = 1. Which of the two fixed points is chosen depends on the random values that emerge in the first few flips of the coin; initially the coin is roughly fair, but as soon as a bias begins to develop, it is rapidly reinforced and it locks in. The extreme case occurs when q(p) is a step function, q(p) = 0 for 0 ≤ p < 1/2, q(1/2) = 1/2 and q(p) = 1 for 1/2 < p ≤ 1. In this case the first coin flip determines the future dynamics entirely: if the first coin flip is heads, then players who favor heads gain wealth relative to those who favor tails, and the coin forever after yields heads, until all the wealth is concentrated with the player that bets most heavily on heads. (And vice versa for tails.)
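The lock-in for the step function can be seen with a deterministic sketch. Under the assumption (ours, for illustration) that the first, genuinely random toss comes up heads, every later value of p exceeds 1/2, so q = 1 and the coin lands heads with certainty; p then increases toward the most heads-heavy strategy, 29/30.

```python
def step_map(p):
    """Extreme self-reinforcing reality map: 0 below 1/2, 1/2 at 1/2, 1 above."""
    if p < 0.5:
        return 0.0
    if p > 0.5:
        return 1.0
    return 0.5

N = 29
s = [(k + 1) / (N + 1) for k in range(N)]   # strategies 1/30, ..., 29/30
w = [1.0 / N] * N                           # equal initial wealth

# First toss: p = 1/2, so q = 1/2 and the outcome is random; suppose it is heads.
p0 = sum(wi * si for wi, si in zip(w, s))
w = [wi * si / p0 for wi, si in zip(w, s)]

history = []
for _ in range(300):
    p = sum(wi * si for wi, si in zip(w, s))
    q = step_map(p)
    history.append((p, q))
    if q not in (0.0, 1.0):
        raise RuntimeError("p returned exactly to 1/2; the outcome is random again")
    if q == 1.0:   # heads with certainty: heads bettors gain
        w = [wi * si / p for wi, si in zip(w, s)]
    else:          # tails with certainty: tails bettors gain
        w = [wi * (1 - si) / (1 - p) for wi, si in zip(w, s)]

print(history[0])    # p = 59/90 after the first heads, so q = 1
print(history[-1])   # p has crept up toward 29/30, and q is still 1
```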

Finally, in (e) we show an example of the bias dynamics for the multi-modal map q(p) = 3p mod 1. In this case the bias oscillates between q = 0 and q = 1, with a variable period on the order of a few hundred iterations. We explain this behavior at the end of the next section.



IV. WEALTH DYNAMICS 

How do the wealths of individual players evolve as a function of time? The purely objective case q = constant, with fixed strategies and a bookmaker instead of pari-mutuel betting, was studied by Kelly [11] and summarized by Cover and Thomas [8]. Assuming all the strategies are distinct, they show that the agent with the strategy closest to q asymptotically accumulates all the wealth. Here "closeness" is defined in terms of the Kullback-Leibler distance between the strategy $s_{il}$ and the true probabilities $q_l$.
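This closeness criterion can be stated as a small function. The sketch below, an illustration with hypothetical names, computes the Kullback-Leibler distance $D(q \| s) = q \log(q/s) + (1-q)\log((1-q)/(1-s))$ for a two-outcome coin and returns the index of the strategy closest to q. The cited result is for fixed q and bookmaker odds; this code does not by itself justify applying it to a general reality map q(p).

```python
import math

def kl_divergence(q, s):
    """Kullback-Leibler distance D(q || s) between coins with P(heads) = q and s."""
    total = 0.0
    for a, b in ((q, s), (1 - q, 1 - s)):
        if a > 0:                      # terms with zero probability contribute nothing
            total += a * math.log(a / b)
    return total

def predicted_winner(q, strategies):
    """Index of the strategy closest to q in KL distance (asymptotic winner for fixed q)."""
    return min(range(len(strategies)), key=lambda i: kl_divergence(q, strategies[i]))

strategies = [k / 30 for k in range(1, 30)]   # 1/30, ..., 29/30
print(predicted_winner(0.5, strategies))       # 14, i.e. s = 15/30 = 1/2
```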

For all reality maps q(p) that we have studied, we find that one player asymptotically accumulates nearly all the wealth. As a particular player becomes more wealthy, it becomes less and less likely that another player will ever overtake this player. This concentration of wealth in the hands of a single player is the fundamental fact driving the convergence of the objective bias dynamics to a fixed point, as observed in the previous section. The reason is simple: once one player has all the wealth, this player completely determines the odds, and since her strategy is fixed, she always places the same bets.

It is possible to compute the distribution of wealth after t steps in closed form for the purely subjective case, q(p) = p. The probability that heads occurs m times in t steps is a sum of binomial distributions, weighted by the initial wealths $w_j^{(0)}$ of the players,

$$P_m^{(t)} = \binom{t}{m} \sum_{j=1}^{N} w_j^{(0)}\, s_j^{m} (1 - s_j)^{t-m},$$

and the corresponding wealth of player i is

$$w_i^{(t)} = \frac{w_i^{(0)}\, s_i^{m} (1 - s_i)^{t-m}}{\sum_{j} w_j^{(0)}\, s_j^{m} (1 - s_j)^{t-m}}.$$
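The closed form can be checked against a direct iteration of the game. Because, for q(p) = p, each player's wealth after k rounds depends only on the number of heads m so far, the distribution of m can also be propagated round by round. The sketch below (our own; function names are hypothetical) computes the heads distribution both ways for the 29-strategy setup with uniform initial wealth and a small t.

```python
from math import comb

def heads_distribution_closed_form(w0, s, t):
    """P_m^(t) for q(p) = p: a mixture of binomials weighted by initial wealth."""
    return [comb(t, m) * sum(wj * sj ** m * (1 - sj) ** (t - m) for wj, sj in zip(w0, s))
            for m in range(t + 1)]

def heads_distribution_by_recursion(w0, s, t):
    """Same distribution obtained by iterating the game round by round.

    After k rounds with m heads, wealth is proportional to w0_j s_j^m (1 - s_j)^(k - m),
    so the probability of heads on the next round is q = p = sum_j w_j s_j.
    """
    def p_heads(m, k):
        num = sum(wj * sj ** (m + 1) * (1 - sj) ** (k - m) for wj, sj in zip(w0, s))
        den = sum(wj * sj ** m * (1 - sj) ** (k - m) for wj, sj in zip(w0, s))
        return num / den

    dist = [1.0]                       # after 0 rounds, m = 0 with probability 1
    for k in range(t):
        new = [0.0] * (k + 2)
        for m, prob in enumerate(dist):
            ph = p_heads(m, k)
            new[m + 1] += prob * ph    # heads
            new[m] += prob * (1 - ph)  # tails
        dist = new
    return dist

N, t = 29, 20
s = [k / (N + 1) for k in range(1, N + 1)]
w0 = [1.0 / N] * N
closed = heads_distribution_closed_form(w0, s, t)
recursive = heads_distribution_by_recursion(w0, s, t)
print(max(abs(a - b) for a, b in zip(closed, recursive)))   # agreement to round-off
```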



When the initial wealths are evenly distributed among the players, no player has an advantage over any other. However, as soon as the first coin toss happens, the distribution of wealth becomes uneven. Wealthier players have an advantage because they have a bigger influence on the odds, so the coin tends to acquire a bias corresponding to the strategies of the dominant (i.e. initially lucky) players. Figure 3 shows the probability $P_m^{(t)}$ for t = 1000 and t = $10^5$. After 1000 steps the binomial distributions still overlap strongly, and there is still a reasonable chance for another strategy to overtake the leading one. After $10^5$ steps, however, the bias of the coin has locked onto an existing strategy $s_j$, because this strategy has almost all the wealth. Once this happens, the probability that this will ever change is extremely low.

We now explain the peculiar bias dynamics observed for the multi-modal map q(p) = 3p mod 1 in Figure 2(e), in which the bias of the coin oscillates wildly between 0 and 1. With the passage of time, the wealth becomes concentrated on strategies near either s = 1/3 or s = 2/3, corresponding to the discontinuities of q(p). Suppose, for example, that $p = \sum_i w_i s_i$ is slightly greater than 1/3, where the q(p) map is close to zero. This causes a transfer of wealth toward strategies with smaller values of s until $p = \sum_i w_i s_i < 1/3$. At this point the bias of the coin flips, because q(p) is now close to one, and the transfer of wealth reverses to favor strategies with higher values of s. Due to fluctuations in the outcomes of the coin toss, this oscillation is not completely regular. It continues indefinitely, even though with the passage of time wealth becomes concentrated more and more tightly around s = 1/3. A similar process occurs if the first coin tosses cause convergence around s = 2/3. We discuss the initial convergence around s = 1/3 or s = 2/3 in more detail in the next section.

FIG. 3: The probability that heads occurs m times in t rounds of the game with q(p) = p, assuming uniform initial wealth, for (a) t = 1000 and (b) t = $10^5$. In (b) note that, although some peaks appear higher than others, the total weight of each peak is the same.
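The restoring mechanism near s = 1/3 can be made quantitative by computing the expected one-round change in p. In the sketch below, the two strategies 0.30 and 0.36 and their wealth splits are hypothetical choices straddling 1/3; the code only shows the sign of the drift on either side of the discontinuity, not the full oscillation, which also depends on the random outcomes.

```python
def q_mod(p):
    """Multi-modal reality map q(p) = 3p mod 1."""
    return (3 * p) % 1.0

def expected_drift(w, s, q_map):
    """Return (p, q, E[change in p]) for one round of the pari-mutuel update."""
    p = sum(wi * si for wi, si in zip(w, s))
    q = q_map(p)
    p_heads = sum(wi * si * si for wi, si in zip(w, s)) / p              # p after heads
    p_tails = sum(wi * si * (1 - si) for wi, si in zip(w, s)) / (1 - p)  # p after tails
    return p, q, q * p_heads + (1 - q) * p_tails - p

s = [0.30, 0.36]                                # hypothetical strategies straddling 1/3
above = expected_drift([0.4, 0.6], s, q_mod)     # p = 0.336, just above 1/3
below = expected_drift([0.6, 0.4], s, q_mod)     # p = 0.324, just below 1/3
print(above)   # q near 0, negative drift: wealth moves toward s = 0.30
print(below)   # q near 1, positive drift: wealth moves toward s = 0.36
```

In both cases the expected drift points back toward p = 1/3, which is consistent with the discontinuity acting as an attractor; random tosses then push p across it, producing the irregular oscillation.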



V. RATIONAL PLAYERS AND NASH EQUILIBRIA

So far we have studied the evolution with a fixed set of strategies. What happens when we instead consider rational players? What are the Nash equilibria? We will see that while the general situation here is quite complicated, we can nonetheless get some analytic insight into the attractors of the game with fixed strategies. Furthermore, we will argue that if one is interested in surviving to dominate the game, the choice of utility function is not arbitrary.

The first question that must be addressed is, "What would rational players reasonably optimize?" The answer is not obvious. For example, suppose a player maximizes the expected payoff on the next step. Assuming a bet on heads has the higher expected payoff, the strategy that maximizes the expected payoff is $s_i = 1$, i.e. betting all the player's wealth on heads. However, this gives a non-zero probability of bankruptcy, and in the long run guarantees that this player's wealth will asymptotically be zero. This illustrates the need for risk aversion, and the need to look ahead more than one move. In general, looking ahead is very complicated, because each player's move affects not only the player's expected wealth on the next step, but also the future bias of the coin, and hence the future wealths of all other players.

For the case q = constant it is possible to show that the strategy that asymptotically accumulates all the wealth maximizes the expected log-return [8]

$$\mathbb{E}\left[\log \frac{w_i^{(t+1)}}{w_i^{(t)}}\right] = \sum_l q_l \log \frac{s_{il}}{p_l}.$$



For q = constant and fixed strategies, maximizing the log-return repeatedly for one step is equivalent to maximizing it over a long time horizon. For a general reality map q(p), however, this is no longer sufficient; it is easy to produce examples for which two-step optimization produces different results than single-step optimization, due to the effect of changes in the strategy's wealth on the future bias of the coin. For the case where the strategy's wealth is negligible, however, the situation is greatly simplified: it is possible to show that maximizing the one-step log-return at each step is equivalent to maximizing the log-return over many steps [7]. In a game with many players, each of whom has small wealth, maximizing the log-return for the next step may be a good approximation to the optimal strategy. As we have already seen, acquiring a lead in the early stages of the game gives an advantage that tends to persist later on.

The expected log return for one step can be written

$$r(s) = q \log \frac{s}{p} + (1 - q) \log \frac{1 - s}{1 - p},$$

where, as before, $q = q_1$ and $p = p_1$, while s and w denote the bet on heads and the wealth of the player in question.
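Because the player's own bet contributes ws to p, we have dp/ds = w, and the first derivative is

$$\frac{dr}{ds} = w q' \left[\log s - \log p - \log(1 - s) + \log(1 - p)\right] + q\left(\frac{1}{s} - \frac{w}{p}\right) - (1 - q)\left(\frac{1}{1 - s} - \frac{w}{1 - p}\right),$$

where q' = dq/dp. A sufficient condition for dr/ds = 0 is s = q = p. In this case the second derivative is

$$\frac{d^2 r}{ds^2} = \frac{1 - w}{s(1 - s)}\left[2 w q' - (1 + w)\right].$$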

For the map $q_a$ the strategy s = 1/2 is a local maximum when a < (1 + w)/(2w). Provided this condition holds, this strategy is what we will call a myopic log Nash equilibrium, i.e., a strategy that, when played against itself, gives the best possible expected log return for the next round of the game. The myopic log Nash equilibria depend on the reality map q(p), but in general they also depend on the wealth of the players, so that they can be dynamic, shifting with each coin toss. For example, when a < 1, s = 1/2 is always a myopic log Nash equilibrium (since (1 + w)/(2w) ≥ 1 for all w ≤ 1), but when a > 1 the myopic log Nash equilibria depend strongly on the wealth of the players. This makes long-range optimization difficult.
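The second-derivative formula can be checked numerically. The sketch below assumes (as in Figure 4) that the remaining wealth 1 - w bets 1/2 on heads, so that s = q = p = 1/2 is a stationary point, and compares a finite-difference estimate of $d^2 r/ds^2$ with $(1-w)/(s(1-s))\,[2wq' - (1+w)]$ using $q' = a$ at p = 1/2. Parameter values and function names are illustrative.

```python
import math

def q_a(p, a):
    """Reality map of equation (1); assumes a >= 0 and 0 < p < 1."""
    x = math.pi * a * (p - 0.5) / (1.0 - (2.0 * p - 1.0) ** 2)
    return 0.5 + math.atan(x) / math.pi

def log_return(s, w, a, s_other=0.5):
    """r(s) for a player with wealth w; the other wealth 1 - w bets s_other on heads."""
    p = w * s + (1 - w) * s_other      # so dp/ds = w
    q = q_a(p, a)
    return q * math.log(s / p) + (1 - q) * math.log((1 - s) / (1 - p))

def second_derivative(f, x, h=1e-4):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2

def predicted_curvature(w, a, s=0.5):
    """(1 - w) / (s (1 - s)) * (2 w q' - (1 + w)) at s = q = p, using q_a'(1/2) = a."""
    return (1 - w) / (s * (1 - s)) * (2 * w * a - (1 + w))

for w, a in [(0.5, 0.5), (0.5, 2.0), (0.2, 1.0)]:
    numeric = second_derivative(lambda s: log_return(s, w, a), 0.5)
    print(w, a, round(numeric, 4), predicted_curvature(w, a))
```

A negative value means s = 1/2 is a local maximum; for w = 0.5 the sign changes at a = (1 + w)/(2w) = 1.5, so a = 2 gives a local minimum.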

This is illustrated in Figure 4, where we show the expected log return for a player (arbitrarily labeled "first") playing a myopic log Nash strategy against another (second) player using a fixed strategy with s = 1/2. When the first player's wealth is low, s = 1/2 is the optimal strategy, and it is a myopic log Nash equilibrium. As the first player gains in wealth, two strategies on either side of s = 1/2 become superior; as wealth increases, these strategies become more and more separated from s = 1/2, and in the limit as w → 1 the optimal strategy is either s = 0 or s = 1.
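The wealth dependence in Figure 4 can be explored with a brute-force grid search. This sketch is ours, not the code used for the figure: it assumes a single opponent holding wealth 1 - w at s = 1/2 and a = 2, and finds the myopic log-optimal bet on a grid of 999 points in (0, 1).

```python
import math

def q_a(p, a):
    """Reality map of equation (1); assumes a >= 0 and 0 < p < 1."""
    x = math.pi * a * (p - 0.5) / (1.0 - (2.0 * p - 1.0) ** 2)
    return 0.5 + math.atan(x) / math.pi

def log_return(s, w, a):
    """r(s) for a player with wealth w against an opponent holding 1 - w at s = 1/2."""
    p = w * s + (1 - w) * 0.5
    q = q_a(p, a)
    return q * math.log(s / p) + (1 - q) * math.log((1 - s) / (1 - p))

def best_bet(w, a, grid_size=1000):
    """Grid search for the myopic log-optimal bet on heads, over s strictly inside (0, 1)."""
    grid = [k / grid_size for k in range(1, grid_size)]
    return max(grid, key=lambda s: log_return(s, w, a))

for w in (0.05, 0.3, 0.6, 0.9):
    print(w, best_bet(w, 2.0))
```

Because $q_a(1-p) = 1 - q_a(p)$, r(s) is symmetric about s = 1/2 in this setup, so once the optimum splits, the mirror image of the returned bet is equally good; which of the two the grid search reports is arbitrary.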

In the limit as the first player's wealth gets large, the possible myopic log Nash equilibria do a good job of predicting the attractors of the objective dynamics shown in Figure 2. For the self-defeating case, q(p) = 1 - p, there is a unique myopic log Nash equilibrium at s = 1/2, corresponding to the unique attractor for q. For the self-reinforcing case with a < 1 this is also true. For the identity map q(p) = p the entire interval 0 ≤ s ≤ 1 is a myopic log Nash equilibrium. When a > 1, either s = 0 or s = 1 can be a myopic log Nash equilibrium. In each case these correspond to the attractors of the wealth dynamics.

FIG. 4: The expected log-return r(s) for a player (arbitrarily labeled "first") using a myopic log Nash strategy against a player using a fixed strategy s = 1/2, where q(p) = $q_a$ with a = 2. This is plotted against the first player's wealth on the x axis and the first player's optimal bet on heads on the y axis.

The multimodal map q(p) = 3p mod 1 is interesting because all of its intersections with the identity are local minima of the expected log return. Instead, the system is attracted to the discontinuities of the map at s = 1/3 and s = 2/3. One can think of the discontinuities of the map as being connected (with infinite slope), creating intersections with the identity that yield local maxima of the log return. While we do not have a formal proof of this, we gave an intuitive explanation of how this stability comes about in Section IV.
VI. EFFICIENCY

As the game is played, the reallocation of wealth causes the population of players to become more efficient, in the sense that there are poorer profit opportunities available for optimal players. This is analogous to financial markets, where wealth reallocation due to profitable vs. unprofitable trading has long been hypothesized to result in a market that is efficient in the sense that all excess profit-making potential has been removed. In financial economics, efficiency is taken as a postulate; there is no theory that guarantees that markets out of equilibrium will always be attracted to a perfectly efficient market equilibrium, and no way to quantitatively measure the inefficiency of a market when it is out of equilibrium. Our game provides a simple setting to study the approach to efficiency in an out-of-equilibrium context.

We can measure the inefficiency of our game based on the returns of what we will call a rational ε player. This player knows the strategies of all other players, and pursues an optimal strategy that maximizes her expected log returns. This player has infinitesimal wealth ε, so that her actions have a negligible effect on the outcome of the game. In the purely objective setting where q = constant, the approach to efficiency is guaranteed by the fact that the wealth dynamics are formally equivalent to Bayesian updating, implying that all the wealth converges on the correct hypothesis about the bias of the coin. For more general settings this is no longer obvious, as there is no longer such a thing as an objectively correct hypothesis.
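For a player whose wealth is negligible, p and q do not depend on her bet, so her one-step expected log return $q\log(s/p) + (1-q)\log((1-s)/(1-p))$ is maximized at s = q (Gibbs' inequality), where it equals the Kullback-Leibler divergence $D(q \| p)$. This is our own reading of the ε-player measure, offered as a sketch; it makes explicit why the inefficiency vanishes when q(p) = p. The code compares the closed form with a brute-force maximum over bets; the function names are hypothetical.

```python
import math

def eps_player_return(s, p, q):
    """Expected one-step log return of a player too small to move p (the epsilon player)."""
    return q * math.log(s / p) + (1 - q) * math.log((1 - s) / (1 - p))

def inefficiency(p, q):
    """Best epsilon-player return: attained at s = q, equal to D(q || p)."""
    total = 0.0
    if q > 0:
        total += q * math.log(q / p)
    if q < 1:
        total += (1 - q) * math.log((1 - q) / (1 - p))
    return total

# Brute-force check on a grid of bets, using the self-defeating map q = 1 - p at p = 0.6
p, q = 0.6, 1 - 0.6
grid = [k / 1000 for k in range(1, 1000)]
brute = max(eps_player_return(s, p, q) for s in grid)
print(inefficiency(p, q), brute)      # both approximately 0.0811
print(inefficiency(0.37, 0.37))       # 0.0 when q(p) = p
```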

We have studied the approach to efficiency numerically for a variety of different reality maps, as shown in Figure 5. To damp out the effect of statistical fluctuations from run to run, we take an ensemble average by varying the random number seed. For q(p) = p we find that the inefficiency is essentially zero at all times. In every other case we find that the inefficiency is a decreasing function of time, asymptotically converging as a power law $t^{-\gamma}$ with $0 \le \gamma \le 1$. For the self-defeating case q(p) = 1 - p we observe γ ≈ 1; for other values of a we observe γ < 1. For example, for a = 0.5, γ ≈ 0.6; for a = 1.5, γ ≈ 0.25; and for a = 2, γ ≈ 0.5. For maps close to the purely subjective case, the inefficiency is initially quite small but convergence is correspondingly slow. For example, compare Figure 5(a) and (b): the self-defeating case is initially much more inefficient than the mildly self-reinforcing case, but by t = 10,000 the situation is reversed. The rate of convergence toward efficiency reflects the slow convergence of the bias of the coin to its fixed point attractor.³
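An exponent such as γ can be estimated by a least-squares fit of log(inefficiency) against log(t) over the range where the scaling holds. The sketch below is a generic fit of this kind (our assumption about the procedure, not necessarily the authors'); it is exercised only on a synthetic exact power law, not on data from the paper.

```python
import math

def fit_power_law_exponent(times, values):
    """Least-squares slope of log(value) vs log(time); returns gamma for value ~ t**(-gamma)."""
    xs = [math.log(t) for t in times]
    ys = [math.log(v) for v in values]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return -sxy / sxx

# Synthetic check only (not data from the paper): an exact power law with gamma = 0.6
times = [10 ** (k / 10) for k in range(10, 41)]   # t from 10 to 10,000, log-spaced
values = [0.05 * t ** -0.6 for t in times]
print(fit_power_law_exponent(times, values))       # 0.6 up to round-off
```

Given the finite-size effect described in footnote 3, one would restrict `times` to the interval where the log-log plot is straight before fitting.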

FIG. 5: The inefficiency of the game, measured by the log returns of an optimal ε player, shown as a function of time using N = 3000 players. Plots are in double logarithmic scale: (a) q(p) = 1 - p; (b) a = 0.5; (c) a = 1.5; (d) a = 2.0.



³ We have performed simulations with different numbers of agents and find that the domain of validity of the power law scaling is truncated for small N (e.g. for N ≈ 30 it extends only to roughly t ≈ 1000). The length of validity of the power law scaling in time increases with N. Thus there is a finite size effect, indicating that the power law scaling is exactly valid only in the limit as N → ∞ and t → ∞.
VII. INCREASING RETURNS

When the reality map for a game of chance is purely objective, a player can't manipulate the outcome unless she cheats. In contrast, for more general reality maps, under some circumstances the player can use the subjective dependence of q(p) to manipulate the objective odds to her advantage. This manifests itself as increasing returns to scale: as the player acquires more wealth, the ability to manipulate the odds increases, and the expected return also increases.

To measure this, we study a rational strategy that exploits its complete knowledge of the strategies of the other players to maximize the expected log-return for the next step. We start with given wealth assignments for the players and vary the fraction of the wealth for the rational player vs. the other players. Under some circumstances, which depend both on q(p) and the distribution of wealth, for self-reinforcing reality maps we find that the returns are an increasing function of the wealth of the rational player. Two examples are given in Figure 6.
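Increasing returns can be probed by computing the best one-step expected log return as a function of the rational player's wealth. The sketch below is loosely modeled on the bottom row of Figure 6 (a = 2, the other 29 players spread over 1/30, ..., 29/30 with equal wealth); how the other players' wealth is rescaled as w varies is our own assumption, as are the function names.

```python
import math

def q_a(p, a):
    """Reality map of equation (1); assumes a >= 0 and 0 < p < 1."""
    x = math.pi * a * (p - 0.5) / (1.0 - (2.0 * p - 1.0) ** 2)
    return 0.5 + math.atan(x) / math.pi

def optimal_return(w, a, others_s, others_w, grid_size=2000):
    """Best one-step expected log return for a rational player holding wealth w.

    The fixed players' wealths are rescaled to sum to 1 - w (our assumption),
    so that total wealth, including the rational player's, equals one.
    """
    scale = (1 - w) / sum(others_w)
    p_others = scale * sum(si * wi for si, wi in zip(others_s, others_w))
    best = -math.inf
    for k in range(1, grid_size):
        s = k / grid_size              # candidate bet on heads, strictly inside (0, 1)
        p = p_others + w * s           # total bet on heads includes her own bet
        q = q_a(p, a)
        r = q * math.log(s / p) + (1 - q) * math.log((1 - s) / (1 - p))
        best = max(best, r)
    return best

N = 29
others_s = [k / (N + 1) for k in range(1, N + 1)]
others_w = [1.0 / N] * N
results = {w: optimal_return(w, 2.0, others_s, others_w) for w in (0.05, 0.3, 0.6, 0.9)}
print(results)
```

In this setup a small player cannot do better than s = 1/2 (zero expected log return), while a player holding most of the wealth can tilt the coin enough to earn a positive expected log return.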

VIII. SUMMARY 

We have introduced a very simple evolutionary game of chance with the nice property that one can explicitly study the influence of the players' actions on the outcome of the game. By altering the reality map q(p) it is possible to continuously vary the setting from completely objective, i.e. the odds are independent of the players' actions, to completely subjective, i.e. the odds are completely determined by the players' actions.

This is an evolutionary game in the strong sense: only one player survives to have non-negligible wealth. Our results suggest that the myopic maximization of expected log returns is a fairly good survival strategy, certainly much better than simply maximizing returns. However, in contrast to the purely objective case, it is provably not an optimal strategy. This is due to the complications induced by the feedback between the success of the players and the objective reality as reflected in the bias of the coin.

FIG. 6: The left column shows the expected log returns r(w) for an optimal player and the right column shows the corresponding optimal strategies s(w), both as a function of her wealth w. In the top row a = 1/2 and t = 7, where $w_i^{(7)}$ is produced by starting at $w_i^{(0)} = 1/N$ and generating 7 heads, where N = 29. In the bottom row a = 2, t = 0, and $w_i^{(0)} = 1/N$.

It has long been known that subjective effects can play an important role in games, causing problems such as increasing returns and lock-in to a particular outcome due to chance events. This model shows that the existence of subjective effects alone is not enough. Instead, for most of the properties we study here, such as selecting between alternative equilibria or increasing returns, we need the self-reinforcing effects to be sufficiently strong to destabilize the objective dynamics. There is a competition between the stabilizing force of wealth concentration and the destabilizing properties of q(p) when q' > 0. This game shows how these effects become steadily stronger as the self-reinforcing nature of the reality map increases. It also shows that these effects are generally complicated and wealth dependent, even in this simple situation. The myopic log Nash equilibria are strongly wealth dependent. Since wealth evolves through time, the myopic log Nash equilibria also evolve in a time-dependent manner.

This game provides a setting in which to study the progression toward efficiency in an out-of-equilibrium context. We have introduced a notion of efficiency that closely resembles arbitrage efficiency in financial markets. We always observe a progression toward efficiency, except for the purely subjective case, in which it appears that any configuration of player strategies automatically produces an efficient market. This isn't surprising, since in the purely subjective case there is no preferred strategy. In every other case, as wealth is reallocated, the game becomes more efficient, in the sense that there are fewer profit-making opportunities for skillful players. For the examples we studied, the inefficiency as a function of time asymptotically decreases as a power law.

One might consider several extensions of the problem studied here. For example, one could study learning (see e.g. [13]). Another interesting possibility is to allow more general reality maps, in which q is a multidimensional function with a multidimensional argument that may depend on the bets of individual players. For example, an interesting case is to allow some players, who might be called pundits, to have more influence on the outcome than others. It would also be very interesting to modify the game so that it is an open system, e.g. relaxing the wealth conservation condition and allowing external inputs. This may prevent the asymptotic convergence of all the wealth to a single player, creating more interesting long-term dynamics.